Hypothesis

Non-linear models will predict an email's folder (FolderId) more accurately than a logistic regression model.



In [1]:

    
# Load data
import pandas as pd
with open('./data_files/8lWZYw-u-yNbGBkC4B--ip77K1oVwwyZTHKLeD7rm7k.csv') as data_file:
    df = pd.read_csv(data_file)
df.head()









    Out[1]:

	Subject	Id	ConversationId	Importance	SentDateTime	Body	CcRecipients	Sender	ToRecipients	FolderId
0	OPGIdentity Requests Quarterly Review and 6 ot...	AAMkADNlYWY3MWVjLTMyYjgtNDg1Ny1hZTk4LWFkZGEyZm...	AAQkADNlYWY3MWVjLTMyYjgtNDg1Ny1hZTk4LWFkZGEyZm...	Normal	2017-02-27T11:06:17Z	Logocidagendaicon Your agenda for Mond...	NaN	no-reply@microsoft.com	dastrock@microsoft.com	AAMkADNlYWY3MWVjLTMyYjgtNDg1Ny1hZTk4LWFkZGEyZm...
1	Throttling alerts from Gateway	AAMkADNlYWY3MWVjLTMyYjgtNDg1Ny1hZTk4LWFkZGEyZm...	AAQkADNlYWY3MWVjLTMyYjgtNDg1Ny1hZTk4LWFkZGEyZm...	Normal	2017-02-21T20:25:49Z	The monitor that will create these alerts has ...	NaN	akina@microsoft.com	msodsswat@microsoft.com;estsincident@microsoft...	AAMkADNlYWY3MWVjLTMyYjgtNDg1Ny1hZTk4LWFkZGEyZm...
2	App API Scrum Monday Series and 4 other event...	AAMkADNlYWY3MWVjLTMyYjgtNDg1Ny1hZTk4LWFkZGEyZm...	AAQkADNlYWY3MWVjLTMyYjgtNDg1Ny1hZTk4LWFkZGEyZm...	Normal	2017-02-20T11:11:30Z	Logocidagendaicon Your agenda for Mond...	NaN	no-reply@microsoft.com	dastrock@microsoft.com	AAMkADNlYWY3MWVjLTMyYjgtNDg1Ny1hZTk4LWFkZGEyZm...
3	Throttling alerts from Gateway	AAMkADNlYWY3MWVjLTMyYjgtNDg1Ny1hZTk4LWFkZGEyZm...	AAQkADNlYWY3MWVjLTMyYjgtNDg1Ny1hZTk4LWFkZGEyZm...	Normal	2017-02-17T22:39:56Z	Description Description Description...	NaN	akina@microsoft.com	msodsswat@microsoft.com;estsincident@microsoft...	AAMkADNlYWY3MWVjLTMyYjgtNDg1Ny1hZTk4LWFkZGEyZm...
4	Notification AAD certificate roll in March for...	AAMkADNlYWY3MWVjLTMyYjgtNDg1Ny1hZTk4LWFkZGEyZm...	AAQkADNlYWY3MWVjLTMyYjgtNDg1Ny1hZTk4LWFkZGEyZm...	Normal	2017-02-15T19:15:21Z	cidimage001png01D16D8184C55A30cidimage003j...	NaN	shiung.yong@microsoft.com	aadpartnersnotify@microsoft.com	AAMkADNlYWY3MWVjLTMyYjgtNDg1Ny1hZTk4LWFkZGEyZm...

Comparing classification models

Do some preprocessing on the text columns (Subject, Body, and possibly To, CC, From)
- Clean NaNs or remove rows of data with NaNs
- Replicate what the Azure Preprocess Text module does for us (stop-word removal, etc.)
- Use scikit-learn where possible
Do some feature construction using pandas & scikit-learn
- On Subject, To, CC, From
- Bag of words
- TF-IDF
One-hot encode FolderId labels into their own boolean columns (1s & 0s)
Ignore better features for now; this is good enough for comparisons
Split data into training & test sets to be used for all candidate classifiers
For each classifier, train a model on the training data
Evaluate performance of each model on the test data, compare to the Logistic Regression model

Constructing Subject Feature Matrix



In [2]:

    
# Remove messages without a Subject
print df.shape
df = df.dropna(subset=['Subject'])
print df.shape









    



(10301, 10)
(10295, 10)



In [3]:

    
# Perform bag of words feature extraction
# TODO: Why are there only 3000 words in the vocabulary?
from sklearn.feature_extraction.text import CountVectorizer
count_vect = CountVectorizer(stop_words='english', lowercase=True)
train_counts = count_vect.fit_transform(df['Subject'])
print 'Dimensions of vocabulary feature matrix are:'
print train_counts.shape









    



Dimensions of vocabulary feature matrix are:
(10295, 3119)
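
One way to start on the TODO above is to look at what the vectorizer keeps and what it drops. The sketch below uses made-up subject lines (toy_subjects is hypothetical, not taken from the data set) to show two default behaviours that shrink the vocabulary: English stop words are removed, and the default token pattern ignores single-character tokens. Subject lines are also short and repetitive, which may matter more here, but that is an assumption to check against the real data.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical subject lines standing in for df['Subject']
toy_subjects = [
    "Throttling alerts from the Gateway",
    "Throttling alerts from the Gateway",
    "Rollout of v 2 certificate",
]

toy_vect = CountVectorizer(stop_words='english', lowercase=True)
toy_counts = toy_vect.fit_transform(toy_subjects)

# vocabulary_ maps each kept token to its column index
toy_vocabulary = sorted(toy_vect.vocabulary_)
print(toy_vocabulary)
print(toy_counts.shape)
```

Running the same inspection on count_vect.vocabulary_ would show which tokens survived for the real subjects.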



In [4]:

    
# Add TF-IDF weighting to account for length of documents
from sklearn.feature_extraction.text import TfidfTransformer
tfidf_transformer = TfidfTransformer()
train_tfidf = tfidf_transformer.fit_transform(train_counts)
print 'Dimensions of vocabulary feature matrix are:'
print train_tfidf.shape
print 'But, its a sparse matrix: ' + str(type(train_tfidf))









    



Dimensions of vocabulary feature matrix are:
(10295, 3119)
But, its a sparse matrix: <class 'scipy.sparse.csr.csr_matrix'>
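
To make the weighting concrete, here is a small sketch that recomputes TF-IDF by hand and compares it with TfidfTransformer. It assumes the scikit-learn defaults used above (smooth_idf=True, norm='l2'), where idf(t) = ln((1 + n) / (1 + df(t))) + 1 and each row is then scaled to unit L2 length. That row scaling is the part that accounts for document length. The count matrix toy_counts is made up.

```python
import numpy as np
import scipy.sparse
from sklearn.feature_extraction.text import TfidfTransformer

# Made-up term counts: 3 documents x 3 terms
toy_counts = np.array([[2.0, 0.0, 1.0],
                       [0.0, 1.0, 1.0],
                       [0.0, 0.0, 3.0]])

n_docs = toy_counts.shape[0]
doc_freq = (toy_counts > 0).sum(axis=0)          # documents containing each term
idf = np.log((1.0 + n_docs) / (1.0 + doc_freq)) + 1.0

weighted = toy_counts * idf                      # term frequency times idf
manual_tfidf = weighted / np.linalg.norm(weighted, axis=1, keepdims=True)

sklearn_tfidf = TfidfTransformer().fit_transform(
    scipy.sparse.csr_matrix(toy_counts)).toarray()

print(np.allclose(manual_tfidf, sklearn_tfidf))
```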

Constructing CC, To, and From



In [5]:

    
# Merge CC, To, From into one People column
people_columns = ['Sender', 'CcRecipients', 'ToRecipients']
# Assign the filled columns back instead of calling fillna(inplace=True) on a column selection
df[people_columns] = df[people_columns].fillna('')
df['People'] = df['Sender'] + ';' + df['CcRecipients'] + ';' + df['ToRecipients']
df.head(10)









    Out[5]:

	Subject	Id	ConversationId	Importance	SentDateTime	Body	CcRecipients	Sender	ToRecipients	FolderId	People
0	OPGIdentity Requests Quarterly Review and 6 ot...	AAMkADNlYWY3MWVjLTMyYjgtNDg1Ny1hZTk4LWFkZGEyZm...	AAQkADNlYWY3MWVjLTMyYjgtNDg1Ny1hZTk4LWFkZGEyZm...	Normal	2017-02-27T11:06:17Z	Logocidagendaicon Your agenda for Mond...		no-reply@microsoft.com	dastrock@microsoft.com	AAMkADNlYWY3MWVjLTMyYjgtNDg1Ny1hZTk4LWFkZGEyZm...	no-reply@microsoft.com;;dastrock@microsoft.com
1	Throttling alerts from Gateway	AAMkADNlYWY3MWVjLTMyYjgtNDg1Ny1hZTk4LWFkZGEyZm...	AAQkADNlYWY3MWVjLTMyYjgtNDg1Ny1hZTk4LWFkZGEyZm...	Normal	2017-02-21T20:25:49Z	The monitor that will create these alerts has ...		akina@microsoft.com	msodsswat@microsoft.com;estsincident@microsoft...	AAMkADNlYWY3MWVjLTMyYjgtNDg1Ny1hZTk4LWFkZGEyZm...	akina@microsoft.com;;msodsswat@microsoft.com;e...
2	App API Scrum Monday Series and 4 other event...	AAMkADNlYWY3MWVjLTMyYjgtNDg1Ny1hZTk4LWFkZGEyZm...	AAQkADNlYWY3MWVjLTMyYjgtNDg1Ny1hZTk4LWFkZGEyZm...	Normal	2017-02-20T11:11:30Z	Logocidagendaicon Your agenda for Mond...		no-reply@microsoft.com	dastrock@microsoft.com	AAMkADNlYWY3MWVjLTMyYjgtNDg1Ny1hZTk4LWFkZGEyZm...	no-reply@microsoft.com;;dastrock@microsoft.com
3	Throttling alerts from Gateway	AAMkADNlYWY3MWVjLTMyYjgtNDg1Ny1hZTk4LWFkZGEyZm...	AAQkADNlYWY3MWVjLTMyYjgtNDg1Ny1hZTk4LWFkZGEyZm...	Normal	2017-02-17T22:39:56Z	Description Description Description...		akina@microsoft.com	msodsswat@microsoft.com;estsincident@microsoft...	AAMkADNlYWY3MWVjLTMyYjgtNDg1Ny1hZTk4LWFkZGEyZm...	akina@microsoft.com;;msodsswat@microsoft.com;e...
4	Notification AAD certificate roll in March for...	AAMkADNlYWY3MWVjLTMyYjgtNDg1Ny1hZTk4LWFkZGEyZm...	AAQkADNlYWY3MWVjLTMyYjgtNDg1Ny1hZTk4LWFkZGEyZm...	Normal	2017-02-15T19:15:21Z	cidimage001png01D16D8184C55A30cidimage003j...		shiung.yong@microsoft.com	aadpartnersnotify@microsoft.com	AAMkADNlYWY3MWVjLTMyYjgtNDg1Ny1hZTk4LWFkZGEyZm...	shiung.yong@microsoft.com;;aadpartnersnotify@m...
5	OpenID RP Certification Launch Announcement	AAMkADNlYWY3MWVjLTMyYjgtNDg1Ny1hZTk4LWFkZGEyZm...	AAQkADNlYWY3MWVjLTMyYjgtNDg1Ny1hZTk4LWFkZGEyZm...	Normal	2017-02-14T22:40:30Z	Part of the OpenID Foundation efforts to conti...	oauth@microsoft.com;catpm@microsoft.com;mssts@...	michael.jones@microsoft.com	openid@microsoft.com	AAMkADNlYWY3MWVjLTMyYjgtNDg1Ny1hZTk4LWFkZGEyZm...	michael.jones@microsoft.com;oauth@microsoft.co...
6	How to determine if an account is fully provis...	AAMkADNlYWY3MWVjLTMyYjgtNDg1Ny1hZTk4LWFkZGEyZm...	AAQkADNlYWY3MWVjLTMyYjgtNDg1Ny1hZTk4LWFkZGEyZm...	Normal	2016-09-06T21:31:42Z	Hey guys I am Anbin from OneNote team I am ...	pthiruv@exchange.microsoft.com;wenjenc@microso...	anbinm@microsoft.com	msareq@microsoft.com	AAMkADNlYWY3MWVjLTMyYjgtNDg1Ny1hZTk4LWFkZGEyZm...	anbinm@microsoft.com;pthiruv@exchange.microsof...
7	Registering map platform component as an app f...	AAMkADNlYWY3MWVjLTMyYjgtNDg1Ny1hZTk4LWFkZGEyZm...	AAQkADNlYWY3MWVjLTMyYjgtNDg1Ny1hZTk4LWFkZGEyZm...	Normal	2016-09-01T02:04:17Z	Hey MsaReq I’m on the maps platform team an...		icheck@microsoft.com	msareq@microsoft.com	AAMkADNlYWY3MWVjLTMyYjgtNDg1Ny1hZTk4LWFkZGEyZm...	icheck@microsoft.com;;msareq@microsoft.com
8	Accepted Fixing MSA Developer Requests	AAMkADNlYWY3MWVjLTMyYjgtNDg1Ny1hZTk4LWFkZGEyZm...	AAQkADNlYWY3MWVjLTMyYjgtNDg1Ny1hZTk4LWFkZGEyZm...	Normal	2016-08-10T19:48:31Z			wibartle@microsoft.com	dastrock@microsoft.com	AAMkADNlYWY3MWVjLTMyYjgtNDg1Ny1hZTk4LWFkZGEyZm...	wibartle@microsoft.com;;dastrock@microsoft.com
9	Accepted Fixing MSA Developer Requests	AAMkADNlYWY3MWVjLTMyYjgtNDg1Ny1hZTk4LWFkZGEyZm...	AAQkADNlYWY3MWVjLTMyYjgtNDg1Ny1hZTk4LWFkZGEyZm...	Normal	2016-08-10T19:47:41Z			adfrei@microsoft.com	dastrock@microsoft.com	AAMkADNlYWY3MWVjLTMyYjgtNDg1Ny1hZTk4LWFkZGEyZm...	adfrei@microsoft.com;;dastrock@microsoft.com



In [6]:

    
# Convert People to matrix representation
people_features = df['People'].str.get_dummies(sep=';')
print people_features.shape
people_features.head()









    



(10295, 3530)






    Out[6]:

	11franklinc@gmail.com	_ram@microsoft.com	a-amgeo@microsoft.com	a-asokuy@microsoft.com	a-barak@microsoft.com	a-bewhi@microsoft.com	a-libren@microsoft.com	a-markr@microsoft.com	a-midumi@microsoft.com	a-pakhar@microsoft.com	...	zideng@microsoft.com	zihliu@microsoft.com	zion.brewer@microsoft.com	zizhong@microsoft.com	zlatkom@exchange.microsoft.com	zoinertejada@solliance.net	zoltanp@exchange.microsoft.com	zorauf@microsoft.com	zsolt.zombik@zsoltzombik.com	zunqwang@microsoft.com
0	0	0	0	0	0	0	0	0	0	0	...	0	0	0	0	0	0	0	0	0	0
1	0	0	0	0	0	0	0	0	0	0	...	0	0	0	0	0	0	0	0	0	0
2	0	0	0	0	0	0	0	0	0	0	...	0	0	0	0	0	0	0	0	0	0
3	0	0	0	0	0	0	0	0	0	0	...	0	0	0	0	0	0	0	0	0	0
4	0	0	0	0	0	0	0	0	0	0	...	0	0	0	0	0	0	0	0	0	0


5 rows × 3530 columns
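
A toy run makes it easier to see what str.get_dummies does with the People strings. The addresses below are made up. Note that the empty token produced by ';;' (a message with no CC) does not become a column.

```python
import pandas as pd

# Made-up People strings: the first has an empty CC slot, like row 0 above
toy_people = pd.Series([
    'a@example.com;;b@example.com',
    'b@example.com;c@example.com;d@example.com',
])

# One indicator column per distinct address, sorted alphabetically
toy_people_features = toy_people.str.get_dummies(sep=';')
print(toy_people_features)
```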



In [7]:

    
# Will need to store people vocabulary for feature construction during predictions
people_vocabulary = people_features.columns
print people_vocabulary[:2]
print len(people_vocabulary)









    



Index([u'11franklinc@gmail.com', u'_ram@microsoft.com'], dtype='object')
3530
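
As a sketch of how the stored vocabulary could be used at prediction time: build dummies for the new message, then reindex to the training columns so the feature layout matches what the model was trained on. Addresses the model has never seen are dropped, and known addresses that are absent become 0. The names stored_vocabulary and new_people are hypothetical, and the addresses are made up.

```python
import pandas as pd

# Training-time vocabulary (toy data)
train_people = pd.Series(['a@example.com;;b@example.com',
                          'b@example.com;c@example.com'])
stored_vocabulary = train_people.str.get_dummies(sep=';').columns

# A new message mentioning one known and one unseen address
new_people = pd.Series(['c@example.com;;z@example.com'])
new_features = (new_people.str.get_dummies(sep=';')
                .reindex(columns=stored_vocabulary, fill_value=0))
print(new_features)
```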



In [8]:

    
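# Convert People features to a csr_matrix so they can be hstacked with the Subject feature matrix
import scipy.sparse  # a bare "import scipy" does not guarantee the sparse submodule is loaded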
sparse_people_features = scipy.sparse.csr_matrix(people_features)
print people_features.shape
print sparse_people_features.shape









    



(10295, 3530)
(10295, 3530)



In [9]:

    
print sparse_people_features.shape
print train_tfidf.shape
feature_matrix = scipy.sparse.hstack([sparse_people_features, train_tfidf])
print feature_matrix.shape









    



(10295, 3530)
(10295, 3119)
(10295, 6649)
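
The stacking step on a toy scale: both blocks need the same number of rows, and the result keeps the column order of the list, so the People columns come first and the Subject TF-IDF columns follow. The matrices below are made up, and passing format='csr' is an addition here (the cell above relies on the default).

```python
import numpy as np
import scipy.sparse

# Two made-up feature blocks for the same 2 messages
toy_people_block = scipy.sparse.csr_matrix(np.array([[1, 0],
                                                     [0, 1]]))
toy_subject_block = scipy.sparse.csr_matrix(np.array([[0.5, 0.0, 0.5],
                                                      [0.0, 1.0, 0.0]]))

# Columns are concatenated left to right; rows stay aligned by position
toy_feature_matrix = scipy.sparse.hstack(
    [toy_people_block, toy_subject_block], format='csr')
print(toy_feature_matrix.shape)
print(toy_feature_matrix.toarray())
```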

Train models & compare accuracies



In [10]:

    
# Split into test and training data sets
from sklearn.model_selection import train_test_split
labels_train, labels_test, features_train, features_test = train_test_split(df['FolderId'], feature_matrix, test_size=0.20, random_state=42)
print labels_train.shape
print labels_test.shape
print features_train.shape
print features_test.shape









    



(8236,)
(2059,)
(8236, 6649)
(2059, 6649)



In [11]:

    
# Construct a list of classifiers
from sklearn.neural_network import MLPClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier

names = [
    "Nearest Neighbors", 
    "Linear SVM", 
    "Decision Tree", 
    "Random Forest", 
    "Neural Net", 
    "AdaBoost",
]

candidate_classifiers = [
    KNeighborsClassifier(),
    SVC(kernel='linear', C=0.025),
    DecisionTreeClassifier(max_depth=5),
    RandomForestClassifier(max_depth=5, n_estimators=10, max_features=1),
    MLPClassifier(alpha=1),
    AdaBoostClassifier(),
]



In [12]:

    
# Train and evaluate models, compare accuracy
from sklearn import metrics
for name, clf in zip(names, candidate_classifiers):
    model = clf.fit(features_train, labels_train)
    predictions = model.predict(features_test)
    print name + ": " + str(metrics.accuracy_score(labels_test, predictions))









    



Nearest Neighbors: 0.885381253035
Linear SVM: 0.797474502186
Decision Tree: 0.175327829043
Random Forest: 0.078678970374
Neural Net: 0.855269548324
AdaBoost: 0.0806216610005
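
The hypothesis is stated against a logistic regression model, but this notebook does not train one, so the accuracies above have no baseline from the same split. Below is a minimal sketch of how such a baseline could be scored with the same scikit-learn API. It runs on synthetic data (toy_features and toy_labels are made up), so its accuracy says nothing about the email data. On the real data one would fit on features_train and labels_train instead. max_iter=1000 is an arbitrary choice, not a setting from the original experiment.

```python
import numpy as np
import scipy.sparse
from sklearn import metrics
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 3 classes, each class switches on its own block of 5 columns
rng = np.random.RandomState(42)
toy_labels = rng.randint(0, 3, size=200)
toy_dense = rng.binomial(1, 0.05, size=(200, 15)).astype(float)  # sparse background noise
for row, label in enumerate(toy_labels):
    toy_dense[row, label * 5:(label + 1) * 5] = 1.0
toy_features = scipy.sparse.csr_matrix(toy_dense)

# Same argument order as the split above: labels first, then features
y_train, y_test, X_train, X_test = train_test_split(
    toy_labels, toy_features, test_size=0.20, random_state=42)

baseline = LogisticRegression(max_iter=1000)
baseline.fit(X_train, y_train)
baseline_accuracy = metrics.accuracy_score(y_test, baseline.predict(X_test))
print("Logistic Regression (toy data): " + str(baseline_accuracy))
```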



In [13]:

    
# Construct a second list of classifiers, to be trained on a dense feature matrix
from sklearn.svm import SVC
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF
from sklearn.naive_bayes import GaussianNB
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis

dense_names = [
    "RBF SVM", 
#     "Gaussian Process", # Taking way too long
    "Naive Bayes",
#     "QDA" # Didn't work for classes with only one sample
]

candidate_dense_classifiers = [
    SVC(gamma=2, C=1),
#     GaussianProcessClassifier(1.0 * RBF(1.0), warm_start=True),
    GaussianNB(),
#     QuadraticDiscriminantAnalysis()
]



In [14]:

    
# Train and evaluate models using dense feature matrix, compare accuracy
from sklearn import metrics
dense_features_train = features_train.toarray()
dense_features_test = features_test.toarray()
for name, clf in zip (dense_names, candidate_dense_classifiers):
    model = clf.fit(dense_features_train, labels_train)
    predictions = model.predict(dense_features_test)
    print name + ": " + str(metrics.accuracy_score(labels_test, predictions))









    



RBF SVM: 0.796988829529
Naive Bayes: 0.937348227295






    



---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-14-64889003908f> in <module>()
      4 dense_features_test = features_test.toarray()
      5 for name, clf in zip (dense_names, candidate_dense_classifiers):
----> 6     model = clf.fit(dense_features_train, labels_train)
      7     predictions = model.predict(dense_features_test)
      8     print name + ": " + str(metrics.accuracy_score(labels_test, predictions))

/Users/strockis/Source/miniconda2/envs/smart-sort/lib/python2.7/site-packages/sklearn/discriminant_analysis.pyc in fit(self, X, y, store_covariances, tol)
    687             if len(Xg) == 1:
    688                 raise ValueError('y has only 1 sample in class %s, covariance '
--> 689                                  'is ill defined.' % str(self.classes_[ind]))
    690             Xgc = Xg - meang
    691             # Xgc = U * S * V.T

ValueError: y has only 1 sample in class AAMkADNlYWY3MWVjLTMyYjgtNDg1Ny1hZTk4LWFkZGEyZmM4YzBjOAAuAAAAAACZhatKmZhBQaIh_GuBK5qjAQALOo6CFxH4Rb3A38IMpKY5AAAFEg0eAAA=, covariance is ill defined.
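
The QDA failure comes from folders that contain a single message. One possible workaround, sketched here on made-up data, is to count messages per FolderId and keep only folders with at least two. This is an assumption about how to proceed, not something the notebook does, and since QDA is fitted on the training split, the same filter would have to be applied to labels_train. The column names mirror the data set, but the rows are hypothetical.

```python
import pandas as pd

# Made-up messages: folder 'b' has only one sample
toy_df = pd.DataFrame({
    'Subject': ['s1', 's2', 's3', 's4', 's5', 's6'],
    'FolderId': ['a', 'a', 'b', 'c', 'c', 'c'],
})

folder_counts = toy_df['FolderId'].value_counts()
folders_to_keep = folder_counts[folder_counts >= 2].index

# Drop rows whose folder has fewer than 2 messages
filtered_df = toy_df[toy_df['FolderId'].isin(folders_to_keep)]
print(filtered_df['FolderId'].value_counts())
```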

Conclusions

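Models which probably deserve more investigation & tuning (in order):
- Multiple logistic regression (the baseline in the hypothesis; not retrained in this notebook)
- Naive Bayes (0.937 test accuracy)
- Nearest neighbors (0.885)
- Neural networks (0.855)
The tree-based models performed very poorly: decision tree 0.175, random forest 0.079, AdaBoost 0.081. This could be my fault: max_depth=5 (and max_features=1 for the random forest) is probably too restrictive for 6,649 sparse features.
Support vector machines come next at about 0.80 for both the linear and RBF kernels, clearly behind the models listed above.
Next steps: focus on quality of feature construction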
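
On the poor tree results: a binary tree of depth 5 has at most 32 leaves, and with indicator-style features each split tends to separate out only the messages matching one sender or word. The sketch below illustrates that capacity limit on made-up data with 40 classes, each marked by its own indicator column. Whether this is what happened on the email data is an assumption. Retraining with a larger max_depth would be the way to check.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

n_classes = 40
# Made-up data: one indicator column per class, 5 identical samples per class
toy_labels = np.repeat(np.arange(n_classes), 5)
toy_features = np.eye(n_classes)[toy_labels]

shallow_tree = DecisionTreeClassifier(max_depth=5, random_state=0)
shallow_tree.fit(toy_features, toy_labels)
deep_tree = DecisionTreeClassifier(random_state=0)
deep_tree.fit(toy_features, toy_labels)

# Scored on the training data on purpose: this measures capacity, not generalization
shallow_accuracy = shallow_tree.score(toy_features, toy_labels)
deep_accuracy = deep_tree.score(toy_features, toy_labels)
print("max_depth=5: " + str(shallow_accuracy))
print("no depth limit: " + str(deep_accuracy))
```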


